Identification of a novel signature based on unfolded protein response-related gene for predicting prognosis in bladder cancer

Background The unfolded protein response (UPR) served as a vital role in the progression of tumors, but the molecule mechanisms of UPR in bladder cancer (BLCA) have been not fully investigated. Methods We identified differentially expressed unfolded protein response-related genes (UPRRGs) between BLCA samples and normal bladder samples in the Cancer Genome Atlas (TCGA) database. Univariate Cox analysis and the least absolute shrinkage and selection operator penalized Cox regression analysis were used to construct a prognostic signature in the TCGA set. We implemented the validation of the prognostic signature in GSE13507 from the Gene Expression Omnibus database. The ESTIMATE, CIBERSORT, and ssGSEA algorithms were used to explore the correlation between the prognostic signature and immune cells infiltration as well as key immune checkpoints (PD-1, PD-L1, CTLA-4, and HAVCR2). GDSC database analyses were conducted to investigate the chemotherapy sensitivity among different groups. GSEA analysis was used to explore the potential mechanisms of UPR-based signature. Results A prognostic signature comprising of seven genes (CALR, CRYAB, DNAJB4, KDELR3, CREB3L3, HSPB6, and FBXO6) was constructed to predict the outcome of BLCA. Based on the UPRRGs signature, the patients with BLCA could be classified into low-risk groups and high-risk groups. Patients with BLCA in the low-risk groups showed the more favorable outcomes than those in the high-risk groups, which was verified in GSE13507 set. This signature could serve as an autocephalous prognostic factor in BLCA. A nomogram based on risk score and clinical characteristics was established to predict the over survival of BLCA patients. Furthermore, the signature was closely related to immune checkpoints (PD-L1, CTLA-4, and HAVCR2) and immune cells infiltration including CD8+ T cells, follicular helper T cells, activated dendritic cells, and M2 macrophages. GSEA analysis indicated that immune and carcinogenic pathways were enriched in high-risk group. Conclusions We identified a novel unfolded protein response-related gene signature which could predict the over survival, immune microenvironment, and chemotherapy response of patients with bladder cancer.


Introduction
Bladder cancer (BLCA) is one of the most common genitourinary cancers and leads to unfavorable outcomes and unsatisfactory quality of life worldwide. Smoking and occupational exposures are closely associated with the tumorigenesis and progression of BLCA [1]. At present, surgical treatment remains the main therapeutic strategy for patients with BLCA. Despite several achievements in surgical equipment and systemic therapy, incidence and mortality of BLCA remain unacceptable [2]. Therefore, investigating the molecular mechanisms and identifying novel biomarkers are of enormous clinical significance to BLCA research and management.
The unfolded protein response (UPR), a prevalent biological phenomenon in eukaryotic cells, is an intracellular signaling pathway of adaptive cell self-protection [3]. Endoplasmic reticulum (ER) homeostasis is constantly threatened by physiological demands and pathological insults such as ER calcium depletion, hypoxia, altered glycosylation, nutrient deprivation, oxidative stress, and DNA damage, which can disrupt the protein folding process and subsequently result in accumulation of unfolded or misfolded proteins in the ER [3][4][5][6]. To dispose the accumulation of the unfolded or misfolded proteins, a series of signal transduction pathways, also collectively known as UPR, were activated in the ER, to maintain ER homeostasis through altering transcriptional and translational programs [7][8][9][10]. Several studies have confirmed that UPR has been implicated in the tumorigenesis owing to the rapid proliferation of tumor cells [5,11,12]. The research progress of targeted UPR in cancer therapy has also attracted increasing attention [7,[13][14][15][16].
In the present study, we investigated the mRNA expression profiles of UPR-related genes from several public databases and established a novel UPR-related gene signature for predicting the outcomes and chemotherapy sensitivity of patients with BLCA. Furthermore, we also found that the UPR-based signature was closely correlated with immune cell infiltration and key immune checkpoints.

Data sources
Unfolded protein response-related genes (UPRRGs) were obtained from UALCAN (ualcan.path.uab.edu/home) database. The mRNA sequencing and clinical data for patients with BLCA were downloaded from The Cancer Genome Atlas (TCGA, https:// portal. gdc. cancer. gov/) database and included data for 19 normal bladder samples and 414 BLCA samples. In addition, GSE13507 [17], containing 165 primary BLCA samples, was used for verification studies.

Identification of differentially expressed UPRRGs
We used limma package in R software to identify differentially expressed UPRRGs (DEUPRRGs) in the TCGA-BLCA dataset. UPRGs with |log2fold change (FC)|> 1 and false discovery rate (FDR) < 0.05 were considered significantly differentially expressed.

Gene ontology and KEGG analyses
To further explore the potential molecular mechanisms in which the DEUPRRGs were involved, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses of DELRGs were performed with packages (ggplot2, org.Hs.eg.db, enrichplot, and clusterProfiler) in R software. P < 0.05 was considered statistically significant.

Protein-protein interaction (PPI) network
The STRING database (http:// www. string-db. org/) was applied to acquire protein-protein interaction (PPI) information related to DEUPRRGs. Cytoscape software (version 3.7.2) was utilized to establish and visualize the PPI network.

Construction and validation of the prognostic signature based on DEUPRRGs
To identify the DEUPRRGs associated with prognosis in patients with BLCA in the TCGA dataset, univariate Cox regression analysis was performed. Only DEUPRRGs with p < 0.05 were selected for subsequent construction of a risk signature with the least absolute shrinkage and selection operator (LASSO) Cox regression model through packages (glmnet and survival) in R software. The risk score was calculated with the following formula: risk scores= n i X i × Y i (where X is the coefficient of the prognosis-related DEUPRRGs and Y is the expression of the relevant gene). BLCA patients were categorized into two groups (high-risk and low-risk) based on the median risk score. Kaplan-Meier analysis was used to compare the survival of BLCA patients between the two groups. Time-dependent receiver operating characteristic (ROC) curves were applied to evaluate the reliability of the constructed signature for predicting prognosis. In addition, GSE13507 was used to verify the performance of this prognostic signature via the same method.

Construction of a nomogram
We investigated the correlation between clinicopathological characteristics and the prognostic signature. Univariate and multivariate Cox regression analyses were performed to ascertain whether this prognostic signature was an independent prognostic indicator. To predict the overall survival of BLAC patients at 3 and 5 years, we constructed a nomogram composed of risk scores and clinical variables with R package (rms). The consistency index was used to assess the accuracy of the nomogram. The calibration curve was utilized to visualize the performance of the nomogram.

Gene set enrichment analysis
Gene set enrichment analysis (GSEA) was conducted to further explore the potential molecular mechanisms among different groups. FDR < 5% and p < 0.05 was considered statistically significant.

Immune infiltration analyses
Given the importance of tumor immune microenvironment, ESTIMATE algorithm was conducted to evaluate the stromal score, ESTIMATE score, and immune score and investigate the relationship between risk score and tumor microenvironment. The Cell type Identification By Estimating Relative Subsets Of RNA Transcripts (CIBER-SORT) algorithm was used to identify the immune cell fractions of 22 distinct leukocyte subsets between different groups. Furthermore, we also performed single-sample gene set enrichment analysis (ssGSEA) to evaluate the differences in immune-related pathways between the two groups via package (gsva) in R software. P values < 0.05 were considered statistically significant.

Chemotherapy sensitivity prediction
To explore the difference of chemotherapy sensitivity between different groups, we used GDSC database to estimate the half maximal inhibitory concentration (IC50) of chemotherapy drugs for predicting the sensitivity of chemotherapy drugs by using the package (pRRophetic). P < 0.05 were considered statistically significant.

Validation of seven prognostic genes
The mRNA expression and promoter methylation level of the seven UPR genes in the prognostic signature in normal tissues and cancer tissues were examined in the UALCAN database (http:// ualcan. path. uab. edu/ index. html).

Immune infiltrates analysis of seven prognostic genes
The Tumor IMmune Estimation Resource (TIMER, https:// cistr ome. shiny apps. io/ timer/) database was used to explore the relationship between the immune cell abundances and the seven prognostic genes in BLCA. To further validate the correlation of seven genes with immune cells markers, the Gene Expression Profiling Interactive Analysis (GEPIA, https:// gepia. cancer-pku. cn) database was used. Statistical analysis was conducted by Spearman's correlation.

Statistical analyses
All these statistical analyses were conducted in R, version 3.6.2. Gene expression, clinical characteristics, immune cell infiltration, and risk score were compared between different groups through the Wilcoxon test. The difference in survival among different groups was evaluated by Kaplan-Meier curve. Independent prognostic analysis was performed by univariate and multivariate Cox regression. P < 0.05 was considered statistically significant.

Differentially expressed UPRRGs and functions
We acquired 251 UPRRGs from the UALCAN database and analyzed the expression of these genes in BLCA samples compared with normal bladder tissues in the TCGA dataset. Forty-four UPRRGs were considered as differentially expressed genes in BLCA with the criteria of |log2 FC|> 1 and FDR < 0.05, including 21 downregulated genes and 23 upregulated genes (Fig. 1A, B). The results of GO analyses showed that the most enriched GO terms in the biological processes were response to topologically incorrect protein, response to unfolded protein, response to endoplasmic reticulum stress, cellular response to unfolded protein, and endoplasmic reticulum unfolded protein response ( Fig. 2A). In the cellular component category, the DEUPRRGs were mainly enriched in endoplasmic reticulum lumen, collagen-containing extracellular matrix, inclusion body, and chaperone complex. In the molecular function category, chaperone binding, unfolded protein binding, cAMP response element binding, structural constituent of eye lens, and glycolipid binding were enriched in these DEUPRRGs. In the KEGG pathway analyses, the results showed that the DEUPRRGs were mainly involved in protein processing in the endoplasmic reticulum, human T-cell leukemia virus 1 infection, the

PPI network
To better understand the roles of these DEUPRRGs in regulating BLCA, a PPI network was constructed through the utilization of the STRING database. Then we used Cytoscape software to analyze and visualize the PPI network, which contained 33 nodes and 57 edges (Fig. 3).

Construction and validation of a SLERGs-based signature
We used univariate Cox regression analysis to identify the DEUPRRGs closely correlated with the overall survival of BLCA patients. Eleven DEUPRRGs were selected for subsequent analyses (p < 0.05) (Fig. 4).We successfully constructed a prognostic model consisting of 7 DEUPRGs (CALR, CRYAB, DNAJB4, KDELR3, CREB3L3, HSPB6, and FBXO6) by performing LASSO Cox regression analysis based on the 11 candidate genes. The risk score was calculated as follows: Risk expression × 0.5633 − FBXO6 expression × 0.3616 + HSPB6 expression × 0.0096. BLCA patients were classified into high-and low-risk groups based on the median risk score. The Kaplan-Meier curves showed that patients in the low-risk group had better outcomes than those in the high-risk group (Fig. 5A). The area under the ROC (AUC) curve value of the constructed model was 0.710 at 5 years (Fig. 5D). The survival status of each patient was represented on dot plots, which showed that patients in the low-risk group had more favorable outcomes than those in the high-risk group (Fig. 5B). Figure 5C indicates that the high-risk group highly expressed CRYAB, DNAJB4, HSPB6, KDELR3, and CALR, while the low-risk group highly expressed FBXO6 and CREB3L3. We also verified the prognostic capacity of this prognostic signature in the Gene Expression Omnibus (GEO) dataset, showing consistent results between GEO dataset and TCGA dataset (Fig. 5E, F, H). These results indicated acceptable performance of the prognostic signature.

The correlation between the prognostic signature and clinical characteristics
We also explored the relationship between the prognostic signature and clinical characteristics. The results showed that the elevated risk score was closely correlated with several inferior clinical characteristics (TNM stage, T stage, N stage, and grade) (Fig. 8A, B). With an increase in risk score, the expression levels of CALR, CRYAB, DNAJB4, KDELR3, and HSPB6 were increased, while the expression levels of FBXO6 and CREB3L3 were decreased. In addition, the elevated expressions of CALR, CRYAB, DNAJB4, KDELR3, and HSPB6 were associated with inferior clinical characteristics, while the under-expressions of FBXO6 and CREB3L3 were associated with superior clinical characteristics. Furthermore, the signature was related to unfavorable prognosis in certain subgroups stratified by sex (male or female), age (> 65 years) or (≤ 65 years), N stage (N0), TNM stage (I-II) or (III-IV), grade (high), and T stage (T3-T4) (Fig. 9). However, the signature showed no differences in other subgroups of N stage (N1-N2-N3), grade (low), and T stage (T1-T2).

GSEA
We performed GSEA to explore the signaling pathways that might be closely enriched among two groups. The results of GSEA indicated that focal adhesion, dilated cardiomyopathy, ECM receptor interaction, TGF-β signaling pathway, GAP junction, pathway in cancer, calcium signaling pathway, MAPK signaling pathway, chemokine signaling pathway, GnRH signaling pathway, and hedgehog signaling pathway were mainly enriched in high-risk group (Fig. 10A). In contrast, oxidative phosphorylation, retinol metabolism, fatty acid metabolism, linoleic acid metabolism, tyrosine metabolism, PPAR signaling pathway, and Parkinson's disease were mainly enriched in low-risk group (Fig. 10B).

Immune cell infiltration analyses
To further investigate the relationship between risk score and immune cell infiltration, ESTIMATE algorithm was first performed. We found that stromal score, immune score, and ESTIMATE score were remarkably higher in high-risk group than those in low-risk group (Fig. 11A).
In addition, the results of the CIBERSORT algorithm showed the significant differences in the proportions of distinct leukocyte subsets between different groups. The proportions of CD8 + T cells, follicular helper T cells, and activated dendritic cells were higher in the low-risk group, whereas those of M0 macrophages, M2 macrophages, and neutrophils were higher in the high-risk group (Fig. 11B). Furthermore, the ssGSEA results indicated that a majority of immune-related pathways had higher enrichment scores in the high-risk group than in the low-risk group. Cytolytic activity, inflammation promotion, MHC class I, type I IFN response, and type II IFN response were not significantly different between the two groups (Fig. 11C). CIBERSORT results showed that immune suppressive cells were mainly enriched in high-risk group (Fig. 12A). To verify the immunosuppressive status of high-risk group, we further investigated the expression of immune suppressive molecules among different groups. The results indicated that immune suppressive molecules were elevated in high-risk group, suggesting that patients of high-risk group have inferior activities of anticancer immune response (Fig. 12B). Furthermore, we also analyzed the expression of key immune checkpoints (PDCD1, PD-L1(CD274), CTLA-4, and HAVCR2) and found that PD-L1, CTLA-4, and HAVCR2 were upregulated in high-risk group (Fig. 12C).
In addition, chemokines implicated in immunosuppressive process (IL-10, TGF-β1, TGF-β2, TGF-β3) were also elevated upregulated in high-risk group (Fig. 12D). All these results indicated that the unfavorable outcomes of high-risk patients might be owing to the immunosuppressive microenvironment.

TIMER analysis
The TIMER database was used to investigate the correlation between the mRNA expressions of CALR, CREB3L3, HSPB6, FBXO6, CRYAB, DNAJB4, KDELR3, and the abundances of immune cells in BLCA (Fig. 13). . Meanwhile, we found that seven UPRGs CNVs were correlated with several immune cells (Fig. 14). We also revealed the correlation between seven UPRGs and immune marker genes of immune cells by using GEPIA database. The results suggested that seven UPRGs were also related to immune marker genes using the GEPIA database ( Table 1).

Validation of the mRNA expression of 7 prognostic genes
UALCAN database was used to examine the mRNA expression and promoter methylation level of seven UPR genes (Fig. 15). The results of UALCAN database analysis showed that the mRNA expression levels of CALR, CREB3L3, FBOX6, and KDELR3 were increased in cancer tissues compared with normal tissues, while the mRNA expression levels of CRYAB, DNAJB4, and HSPB6 were decreased in BLCA. Furthermore, the elevated expressions of CALR, CREB3L3, FBOX6, and Table 1 The correlation between seven UPR genes and immune cell markers ***P < 0.001, **P < 0.01, *P < 0.05, ns  However, we did not find the correlation between the mRNA expression and promoter methylation level of DNAJB4. All these results were consistent with previous results.

Correlation between risk score and chemotherapy sensitivity
In the current research, we used GDSC database to predict the response to common chemotherapy drugs by estimating the differences of IC50 among different groups. We found that patients with high risk score were more sensitive to several chemotherapy drugs, including cisplatin, rapamycin, cyclopamine, and bleomycin, while patients with low risk score were more sensitive to metformin and methotrexate (Fig. 16). In addition, there was no obvious difference in sensitivity of gemcitabine, mitomycin C, and doxorubicin among two groups.

Discussion
Bladder cancer (BLCA) is a highly heterogeneous malignancy that develops via a complex multistep biological process. Accumulating evidence has demonstrated the crucial role of the UPR in the progression and outcome of various types of cancer, including BLCA [12,18,19]. The UPR was mainly mediated by three ER transmembrane Fig. 16 Chemotherapy sensitivity prediction between different groups protein sensors: inositol requiring kinase 1 (IRE1), protein kinase RNA-like endoplasmic reticulum kinase (PERK), and activating transcription factor 6 (ATF6) [20]. Silencing OTUB1 can markedly suppress the proliferation and migration of BLCA cells through activating ATF6 signaling [18]. Most of studies focused on the effect of UPR in cancer progression and metastasis, and few studies have investigated the prognostic value of UPRrelated genes in cancers, especially in BLCA. Recently, mounting evidence has confirmed that multiple genes model is being used to predict outcomes and therapeutic effect seems to have high credibility [21][22][23][24].
Hence, in the current study, we conducted comprehensive bioinformatics approach to investigate the expression patterns of 251 UPR-related genes in BLCA and their correlation with OS. We identified a prognostic signature consisting of 7 UPRGs correlated with progression and outcomes of patients with BLCA and verified the novel prognostic signature in an external dataset GSE13507. Univariate and multivariate Cox regression analysis suggested that the signature based on 7 UPRGs was independent prognostic factor for survival in BLCA patients. Multivariate ROC analysis showed that risk scores were superior to traditional prognostic factors in predicting outcomes of patients with BLCA. In addition, UPR-based signature was closely associated with inferior clinical characteristics. Subgroups analyses stratified by clinical characteristics also indicated that patients with low risk scores had longer survival than those with high risk scores. A nomogram incorporating clinical characteristics and risk score was established to predict the outcomes of BLCA patients.
Among the seven UPRGs in the prognostic signature that we established, calreticulin (CALR), with a high ability for Ca 2+ binding, has been implicated in the homeostasis of Ca 2+ and protein folding as well as cell adhesion [25]. Several studies have also shown that CALR was associated with immune responses and apoptosis [26,27]. Mounting evidence suggested that CALR expression was correlated with carcinogenesis, progression, outcome, and drug resistance as well as epithelial-mesenchymal transition [28]. Alpha-crystallin B (CRYAB), a member of the small heat-shock protein family with the functions of suppressing protein aggregation and promoting protein refolding, has been reported to inhibit the tumorigenic phenotypes of BLCA cells through suppression of the PI3K/AKT and ERK signaling pathway [29]. DnaJ homolog subfamily B member 4 (DNAJB4), also called HLJ1, has been proven to serve as a tumor suppressor role in many types of cancer including lung cancer, melanoma, and breast cancer [30][31][32]. Genetic deletion of KDELR3 was found to inhibit the lung metastasis of melanoma cells [33]. CREB3L3, an ER stress-associated transcription factor, was shown to be related to the proliferation of HBV-associated HCC cells by regulating the PI3K/Akt and AMPK signaling pathways [34]. Studies have confirmed the critical function of HSPB6 in the endothelial proliferation and migration of various cancers [35,36]. Overexpression of HSPB6 suppressed BLCA T24 cell migration to a certain level but had no effect on T24 cell proliferation [37]. F-box protein 6 (FBXO6), a vital component of the ubiquitin protein ligase complex, has been reported to be involved in the invasion and sensitivity of cisplatin [38]. Wang et al. reported that FBXO6 might be a CD8 + T cell infiltration-promoting factor and improve anti-PD1 drug resistance in BLCA patients [39].
Previous studies have confirmed that UPR was highly correlated with immune cell infiltration. In addition, accumulating evidence has also confirmed that immune cell infiltration was closely associated with the development, progression, and prognosis as well as the treatment of BLCA. Therefore, the stromal score, ESTIMATE score, and immune score of BLCA samples were estimated by making use of the ESTIMATE algorithm. We found that the risk score was positively correlated with immune score, stromal score, and ESTIMATE score. Subsequently, we further used the CIBERSORT algorithm and ssGSEA to explore the relationship between the prognostic signature and immune cell infiltration. The results indicated that the signature based on 7 UPRGs was closely associated with immune cell infiltration. The low-risk group had a higher proportion of CD8 + T cells than high-risk groups, which was in line with previous research showing that the infiltration level of CD8 + T cells was positively related to favorable prognosis [39,40]. In contrast, M2 macrophages and neutrophils had higher proportions in the high-risk group than low-risk group, which was also in accordance with previous studies that M2 macrophages and neutrophils were related to unfavorable outcomes [41][42][43]. In addition, cytokines (IL-10, TGF-β1, TGF-β2, TGF-β3) involved in immunosuppressive process and immune checkpoints (PD-L1, CTLA-4, and HAVCR2) were elevated in high-risk group, which indicated the immunosuppression microenvironment in high-risk group. These results exhibited the potential of UPR signature in predicting immune microenvironment of BLCA, which might improve the immunotherapeutic effect of tumor.
We further investigated the potential mechanisms among different groups and found that immune and carcinogenic pathways were enriched in high-risk group, which might explain the unfavorable prognosis and immune-suppressive status of patients in high-risk group, while the metabolism-associated pathways were highly enriched in low-risk groups. In addition, patients with high risk score might benefit from several chemotherapy drugs, including cisplatin, rapamycin, cyclopamine, and bleomycin, while patients with low risk score might benefit from the treatments of metformin and methotrexate.
In brief, UPR plays a critical role in the occurrence and progression of tumors. Our study demonstrated the value of a set of UPRGs as a prognostic biomarker in BLCA. Nevertheless, as a retrospective study performed through bioinformatics analyses, this study inevitably has several disadvantages. In addition, the biological mechanisms of these prognostic UPRGs remain to be further confirmed by relevant experiments and clinical evidence.

Conclusion
In summary, we established and validated a risk model based on seven UPRGs for predicting the prognosis, immune microenvironment, and chemotherapy response of patients with BLCA, which might have reliable potential for clinical application in BLCA.